feat/add-support-qwen3 by fengzz-coding · Pull Request #17 · FLock-io/FLock-validator

fengzz-coding · 2026-03-02T13:35:59Z

No description provided.

…t/add-support-qwen3

nickcom007 · 2026-03-20T01:06:27Z

validator/modules/llm_judge/__init__.py

-                                role in ["user", "assistant", "function_call"]
-                                and content
-                            ):
+                            if not content:


1. Reference extraction can be misaligned with the actual prompt

You extract the reference using:

last_msg = conversations[-1]

but you truncate the processed messages with:

conversation_to_process = conversation_to_process[:-1]

Problem:

conversations = raw input

conversation_to_process = filtered + transformed version

If any messages were:

skipped (empty content),

transformed (e.g. function_call → assistant),

or failed parsing,

then the “last raw message” may not match the “last processed message”.

You may remove one message from the prompt, but use a different one as the reference.

2. function_call reference is raw JSON string (may not match your evaluation goal)

When the last message is a function_call, you do:

reference_response = last_msg["content"]

This gives you something like:

{"name":"get_weather","arguments":{"city":"Toronto"}}

So your reference is:

a raw JSON string, not

a structured tool call, nor

a natural language answer

This is only correct if your evaluation expects:

exact string match of the function call JSON

Otherwise it may be inconsistent with:

how your template represents tool calls (tool_calls structure)

or how your model outputs them

3. Tool call ↔ observation matching is simplified (not robust)

You assign each observation to the most recent tool call:

for prev_msg in reversed(conversation_to_process): if prev_msg.get("role") == "assistant" and prev_msg.get("tool_calls"): tool_call_id = prev_msg["tool_calls"][0]["id"] break

This assumes:

one tool call at a time

one observation per call

strictly sequential flow

Works fine for simple ReAct-style traces like:

assistant → tool_call tool → observation assistant → next step

But breaks or becomes ambiguous if:

multiple tool calls in one assistant message

multiple observations

parallel or interleaved calls

add support qwen3

a213598

fengzz-coding requested a review from nickcom007 March 2, 2026 13:35

fengzz-coding added 3 commits March 11, 2026 09:05

update dependencies

ea5f3fa

only support for qwen model

46c9983

Merge branch 'main' into feat/add-support-qwen3

c36db98

fengzz-coding requested review from nickcom007 and removed request for nickcom007 March 12, 2026 04:05

fengzz-coding added 6 commits March 17, 2026 16:48

delete template.py

cbfeac9

Merge remote-tracking branch 'origin/feat/add-support-qwen3' into fea…

a21c3ec

…t/add-support-qwen3

add hf_tokenizer mapping

13df7e5

only keep qwen3.5 model

be3280d

fix qwen3.5 model function calling bug.

ff452a5

fix qwen3.5 model function calling bug.

4ec68cf

nickcom007 reviewed Mar 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat/add-support-qwen3#17

feat/add-support-qwen3#17
fengzz-coding wants to merge 10 commits intomainfrom
feat/add-support-qwen3

fengzz-coding commented Mar 2, 2026

Uh oh!

nickcom007 Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fengzz-coding commented Mar 2, 2026

Uh oh!

nickcom007 Mar 20, 2026

Choose a reason for hiding this comment

1. Reference extraction can be misaligned with the actual prompt

2. function_call reference is raw JSON string (may not match your evaluation goal)

3. Tool call ↔ observation matching is simplified (not robust)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

2. `function_call` reference is raw JSON string (may not match your evaluation goal)